This document:

  1. Reads in the CTD files

  2. Does QAQC if necessary,

  3. Finds casts taken at the same reservoir on the same day for comparisons,

  4. Plot comparison casts for each date, variable, and reservoir

  5. Make a regression plot for each variable

  6. Run t-tests for each variable

  7. Plot and run Bland Altman Test

  8. Plot and run Passin Bablok Test

Comparison Plots from both CTDs split by date, variable, and reservoir

## Warning: Removed 9388 rows containing missing values (`geom_point()`).

## Warning: Removed 21649 rows containing missing values (`geom_point()`).

## Warning: Removed 21649 rows containing missing values (`geom_point()`).

## Warning: Removed 9399 rows containing missing values (`geom_point()`).

## Warning: Removed 9399 rows containing missing values (`geom_point()`).

## Warning: Removed 9716 rows containing missing values (`geom_point()`).

## Warning: Removed 9716 rows containing missing values (`geom_point()`).

## Warning: Removed 40976 rows containing missing values (`geom_point()`).

## Warning: Removed 40976 rows containing missing values (`geom_point()`).

## Warning: Removed 1646 rows containing missing values (`geom_point()`).

## Warning: Removed 40367 rows containing missing values (`geom_point()`).

## Warning: Removed 40442 rows containing missing values (`geom_point()`).

## Warning: Removed 40420 rows containing missing values (`geom_point()`).

## Warning: Removed 2306 rows containing missing values (`geom_point()`).

## Warning: Removed 12908 rows containing missing values (`geom_point()`).

## Warning: Removed 12908 rows containing missing values (`geom_point()`).

## Warning: Removed 2350 rows containing missing values (`geom_point()`).

## Warning: Removed 2350 rows containing missing values (`geom_point()`).

## Warning: Removed 785 rows containing missing values (`geom_point()`).

## Warning: Removed 785 rows containing missing values (`geom_point()`).

## Warning: Removed 18652 rows containing missing values (`geom_point()`).

## Warning: Removed 18652 rows containing missing values (`geom_point()`).

## Warning: Removed 21043 rows containing missing values (`geom_point()`).

## Warning: Removed 21055 rows containing missing values (`geom_point()`).

## Warning: Removed 21057 rows containing missing values (`geom_point()`).

## Warning: Removed 15 rows containing missing values (`geom_point()`).

## Warning: Removed 15 rows containing missing values (`geom_point()`).

## Warning: Removed 1520 rows containing missing values (`geom_point()`).

## Warning: Removed 1520 rows containing missing values (`geom_point()`).

## Warning: Removed 2615 rows containing missing values (`geom_point()`).

## Warning: Removed 2615 rows containing missing values (`geom_point()`).

## Warning: Removed 3010 rows containing missing values (`geom_point()`).

## Warning: Removed 3175 rows containing missing values (`geom_point()`).

## Warning: Removed 3022 rows containing missing values (`geom_point()`).

Scatterplots of the old CTD vs. new CTD

## Warning: There was 1 warning in `filter()`.
## ℹ In argument: `==...`.
## Caused by warning in `variable == c("Temp_C", "DO_mgL", "DOsat_percent", "Cond_uScm", "SpCond_uScm",
##     "Chla_ugL", "Turbidity_NTU", "PAR_umolm2s")`:
## ! longer object length is not a multiple of shorter object length
## `summarise()` has grouped output by 'Reservoir', 'Site', 'Date', 'SN',
## 'variable'. You can override using the `.groups` argument.
## Warning: Removed 1950 rows containing missing values (`geom_point()`).

Notes from Bobbie’s R Script comparing methods in the Analytical Lab

R Script to compare methods in the Analytical Lab Authors: Bobbie Niederlehner Last Edited: 06/07/2022

Steps 1) load the libraries and the data 2) remove lines with missing values for either analysis 3) OPTIONAL: restrict the concentration range being analyzed 4) Paired T-Test- Test differences = 0 with paired T (NOT a preferred way to look at these data but magnitude of difference is interesting) 5) Bland Altman - Visualize differences over concentration (difference over mean concentration) 6) Passing Bablok - Test coincidence across concentrations - regression of new vs reference method. Does it have a slope of 1 and an intercept of 0

BACKGROUND INFO: Some web resources explaining method comparison regressions https://www.r-bloggers.com/deming-and-passing-bablok-regression-in-r/ https://www.r-bloggers.com/2015/09/deming-and-passing-bablok-regression-in-r/ http://labrtorian.com/tag/passing-bablok/ https://www.r-bloggers.com/2016/08/a-shiny-app-for-passing-bablok-and-deming-regression/ https://bahar.shinyapps.io/method_compare/

Notes NOTE: the order of the variables in each analysis can matter FOR paired T-test it doesn’t matter. Just pay attention to sign for difference. Difference is calculated as 1st specified (x) minus 2nd specified (y) FOR Bland Altman it doesn’t matter, Default difference is 1st specified (x) minus 2nd specified (y) FOR Passing Bablok first listed is reference method (x, Method1, or current), second listed is is test method (y, Method2, or new) The intercept is often smean difference, but will have the reverse sign.

Output from t-tests

## [1] "Chla_ugL"
## 
##  Paired t-test
## 
## data:  a$obs_S7809 and a$obs_S8188
## t = 20.519, df = 808, p-value < 2.2e-16
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  1.500423 1.817859
## sample estimates:
## mean difference 
##        1.659141 
## 
## [1] "Cond_uScm"
## 
##  Paired t-test
## 
## data:  a$obs_S7809 and a$obs_S8188
## t = 2.4397, df = 789, p-value = 0.01492
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  0.2213863 2.0444574
## sample estimates:
## mean difference 
##        1.132922 
## 
## [1] "DO_mgL"
## 
##  Paired t-test
## 
## data:  a$obs_S7809 and a$obs_S8188
## t = -7.5737, df = 425, p-value = 2.284e-13
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -0.6395939 -0.3760177
## sample estimates:
## mean difference 
##      -0.5078058 
## 
## [1] "DOsat_percent"
## 
##  Paired t-test
## 
## data:  a$obs_S7809 and a$obs_S8188
## t = -6.9338, df = 425, p-value = 1.531e-11
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -5.718669 -3.192554
## sample estimates:
## mean difference 
##       -4.455611 
## 
## [1] "PAR_umolm2s"
## 
##  Paired t-test
## 
## data:  a$obs_S7809 and a$obs_S8188
## t = -3.0679, df = 1017, p-value = 0.002212
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -21.81894  -4.79576
## sample estimates:
## mean difference 
##       -13.30735 
## 
## [1] "SpCond_uScm"
## 
##  Paired t-test
## 
## data:  a$obs_S7809 and a$obs_S8188
## t = 2.3825, df = 789, p-value = 0.01743
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  0.2679938 2.7761289
## sample estimates:
## mean difference 
##        1.522061 
## 
## [1] "Temp_C"
## 
##  Paired t-test
## 
## data:  a$obs_S7809 and a$obs_S8188
## t = 0.16491, df = 790, p-value = 0.8691
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -0.02421514  0.02865686
## sample estimates:
## mean difference 
##     0.002220858 
## 
## [1] "Turbidity_NTU"
## 
##  Paired t-test
## 
## data:  a$obs_S7809 and a$obs_S8188
## t = -0.97952, df = 808, p-value = 0.3276
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -2.354717  0.786967
## sample estimates:
## mean difference 
##      -0.7838751

Plots and outputs from Bland Altman

#  Bland Altman Plots (difference over mean)
#      Estimates an agreement interval, within which 95% of the differences fall. 
#      BUT criterion for agreement should be decided on apriori and is completely based on judgement - what can you tolerate?

for(b in 1:length(var)){
  a <-round_CTD%>%
    filter(variable==var[b])
  
  print(var[b])
  
   ba<-bland.altman.stats(a$obs_S7809,a$obs_S8188, conf.int = 0.95)
#}

print (c("mean difference", "lower limit", "upper limit","sample size"))
print (c(ba$mean.diffs, ba$lower.limit, ba$upper.limit, ba$based.on))

#     IF you prefer GGPLOT2 specify that graph.sys  
#         geom_count = TRUE handles overlapping points by making the plot symbol larger
ba2 <- bland.altman.plot(a$obs_S7809,a$obs_S8188, graph.sys = "ggplot2", conf.int=.95, geom_count = F)
print (ba2 +
         xlab("Average of Two Measurments") + 
         ylab ("Difference of Two Measurments") +
         ggtitle(paste0(var[b]," Bland Altman plot")) +
         theme_bw()+
          theme(plot.title = element_text(hjust = 0.5)))
}
## [1] "Chla_ugL"
## [1] "mean difference" "lower limit"     "upper limit"     "sample size"    
## [1]   1.659141  -2.848578   6.166860 809.000000

## [1] "Cond_uScm"
## [1] "mean difference" "lower limit"     "upper limit"     "sample size"    
## [1]   1.132922 -24.448720  26.714564 790.000000

## [1] "DO_mgL"
## [1] "mean difference" "lower limit"     "upper limit"     "sample size"    
## [1]  -0.5078058  -3.2201849   2.2045733 426.0000000

## [1] "DOsat_percent"
## [1] "mean difference" "lower limit"     "upper limit"     "sample size"    
## [1]  -4.455611 -30.451057  21.539834 426.000000

## [1] "PAR_umolm2s"
## [1] "mean difference" "lower limit"     "upper limit"     "sample size"    
## [1]  -13.30735 -284.56092  257.94622 1018.00000

## [1] "SpCond_uScm"
## [1] "mean difference" "lower limit"     "upper limit"     "sample size"    
## [1]   1.522061 -33.672514  36.716636 790.000000

## [1] "Temp_C"
## [1] "mean difference" "lower limit"     "upper limit"     "sample size"    
## [1]   0.002220858  -0.740158891   0.744600607 791.000000000

## [1] "Turbidity_NTU"
## [1] "mean difference" "lower limit"     "upper limit"     "sample size"    
## [1]  -0.7838751 -45.3971128  43.8293627 809.0000000

Passin Bablok Test and Output

Method Comparison Regression using nonparametric Passing Bablok (MORE RESISTANT TO OUTLIERS) The first listed variable definitely ends up on the X axis and documentation says it is the “reference” method The second listed variable ends up on the y axis and documentation says it is the “test” method

## [1] "Chla_ugL"
## [1] "Passing Bablok"
## 
## 
## ------------------------------------------
## 
## Reference method: obs_S7809
## Test method:     obs_S8188
## Number of data points: 809
## 
## ------------------------------------------
## 
## The confidence intervals are calculated with
##  bootstrap  ( quantile ) method.
## Confidence level: 95%
## 
## 
## ------------------------------------------
## 
## PASSING BABLOK REGRESSION FIT:
## 
##                  EST SE        LCI        UCI
## Intercept -0.1517648 NA -0.1970689 -0.1201603
## Slope      0.7429368 NA  0.7315125  0.7543014
## 
## 
## ------------------------------------------
## 
## BOOTSTRAP SUMMARY
## 
##           global.est bootstrap.mean     bias bootstrap.se
## Intercept   -0.15176       -0.15248 -0.00071      0.01895
## Slope        0.74294        0.74298  0.00004      0.00592
## 
## Bootstrap results generated with environment RNG settings.
## [1] "pearson correlation coefficient"

## [1] "Cond_uScm"
## [1] "Passing Bablok"
## 
## 
## ------------------------------------------
## 
## Reference method: obs_S7809
## Test method:     obs_S8188
## Number of data points: 790
## 
## ------------------------------------------
## 
## The confidence intervals are calculated with
##  bootstrap  ( quantile ) method.
## Confidence level: 95%
## 
## 
## ------------------------------------------
## 
## PASSING BABLOK REGRESSION FIT:
## 
##                 EST SE       LCI       UCI
## Intercept 0.6949853 NA 0.5745776 0.8651358
## Slope     0.9806406 NA 0.9745454 0.9847817
## 
## 
## ------------------------------------------
## 
## BOOTSTRAP SUMMARY
## 
##           global.est bootstrap.mean     bias bootstrap.se
## Intercept    0.69499        0.70147  0.00648      0.07554
## Slope        0.98064        0.98028 -0.00036      0.00271
## 
## Bootstrap results generated with environment RNG settings.
## [1] "pearson correlation coefficient"

## [1] "DO_mgL"
## [1] "Passing Bablok"
## 
## 
## ------------------------------------------
## 
## Reference method: obs_S7809
## Test method:     obs_S8188
## Number of data points: 426
## 
## ------------------------------------------
## 
## The confidence intervals are calculated with
##  bootstrap  ( quantile ) method.
## Confidence level: 95%
## 
## 
## ------------------------------------------
## 
## PASSING BABLOK REGRESSION FIT:
## 
##                  EST SE         LCI       UCI
## Intercept 0.01879691 NA -0.01224796 0.0337898
## Slope     1.14044814 NA  1.07834127 1.1821948
## 
## 
## ------------------------------------------
## 
## BOOTSTRAP SUMMARY
## 
##           global.est bootstrap.mean     bias bootstrap.se
## Intercept    0.01880        0.01794 -0.00086      0.01232
## Slope        1.14045        1.13537 -0.00508      0.02749
## 
## Bootstrap results generated with environment RNG settings.
## [1] "pearson correlation coefficient"

## [1] "DOsat_percent"
## [1] "Passing Bablok"
## 
## 
## ------------------------------------------
## 
## Reference method: obs_S7809
## Test method:     obs_S8188
## Number of data points: 426
## 
## ------------------------------------------
## 
## The confidence intervals are calculated with
##  bootstrap  ( quantile ) method.
## Confidence level: 95%
## 
## 
## ------------------------------------------
## 
## PASSING BABLOK REGRESSION FIT:
## 
##                 EST SE        LCI       UCI
## Intercept 0.2662422 NA 0.06865347 0.3889467
## Slope     1.1109153 NA 1.04714755 1.1622111
## 
## 
## ------------------------------------------
## 
## BOOTSTRAP SUMMARY
## 
##           global.est bootstrap.mean     bias bootstrap.se
## Intercept    0.26624        0.26768  0.00144      0.07184
## Slope        1.11092        1.10675 -0.00416      0.03024
## 
## Bootstrap results generated with environment RNG settings.
## [1] "pearson correlation coefficient"

## [1] "PAR_umolm2s"
## [1] "Passing Bablok"
## 
## 
## ------------------------------------------
## 
## Reference method: obs_S7809
## Test method:     obs_S8188
## Number of data points: 1018
## 
## ------------------------------------------
## 
## The confidence intervals are calculated with
##  bootstrap  ( quantile ) method.
## Confidence level: 95%
## 
## 
## ------------------------------------------
## 
## PASSING BABLOK REGRESSION FIT:
## 
##                    EST SE          LCI          UCI
## Intercept -0.005063738 NA -0.008482556 -0.002351462
## Slope      0.979424973 NA  0.957476818  1.007061753
## 
## 
## ------------------------------------------
## 
## BOOTSTRAP SUMMARY
## 
##           global.est bootstrap.mean     bias bootstrap.se
## Intercept   -0.00506       -0.00513 -0.00007      0.00160
## Slope        0.97942        0.97998  0.00055      0.01288
## 
## Bootstrap results generated with environment RNG settings.
## [1] "pearson correlation coefficient"

## [1] "SpCond_uScm"
## [1] "Passing Bablok"
## 
## 
## ------------------------------------------
## 
## Reference method: obs_S7809
## Test method:     obs_S8188
## Number of data points: 790
## 
## ------------------------------------------
## 
## The confidence intervals are calculated with
##  bootstrap  ( quantile ) method.
## Confidence level: 95%
## 
## 
## ------------------------------------------
## 
## PASSING BABLOK REGRESSION FIT:
## 
##                 EST SE       LCI       UCI
## Intercept 0.7098330 NA 0.5260216 0.8726610
## Slope     0.9860639 NA 0.9817464 0.9911673
## 
## 
## ------------------------------------------
## 
## BOOTSTRAP SUMMARY
## 
##           global.est bootstrap.mean     bias bootstrap.se
## Intercept    0.70983        0.70843 -0.00141      0.09117
## Slope        0.98606        0.98608  0.00002      0.00244
## 
## Bootstrap results generated with environment RNG settings.
## [1] "pearson correlation coefficient"

## [1] "Temp_C"
## [1] "Passing Bablok"
## 
## 
## ------------------------------------------
## 
## Reference method: obs_S7809
## Test method:     obs_S8188
## Number of data points: 791
## 
## ------------------------------------------
## 
## The confidence intervals are calculated with
##  bootstrap  ( quantile ) method.
## Confidence level: 95%
## 
## 
## ------------------------------------------
## 
## PASSING BABLOK REGRESSION FIT:
## 
##                   EST SE         LCI        UCI
## Intercept 0.007606034 NA -0.01213847 0.02907899
## Slope     0.999120416 NA  0.99715429 1.00090233
## 
## 
## ------------------------------------------
## 
## BOOTSTRAP SUMMARY
## 
##           global.est bootstrap.mean   bias bootstrap.se
## Intercept    0.00761        0.00811  5e-04      0.01069
## Slope        0.99912        0.99907 -5e-05      0.00098
## 
## Bootstrap results generated with environment RNG settings.
## [1] "pearson correlation coefficient"

## [1] "Turbidity_NTU"
## [1] "Passing Bablok"
## 
## 
## ------------------------------------------
## 
## Reference method: obs_S7809
## Test method:     obs_S8188
## Number of data points: 809
## 
## ------------------------------------------
## 
## The confidence intervals are calculated with
##  bootstrap  ( quantile ) method.
## Confidence level: 95%
## 
## 
## ------------------------------------------
## 
## PASSING BABLOK REGRESSION FIT:
## 
##                  EST SE        LCI        UCI
## Intercept -0.3725648 NA -0.5865577 -0.2518306
## Slope      0.9621016 NA  0.9455358  0.9776715
## 
## 
## ------------------------------------------
## 
## BOOTSTRAP SUMMARY
## 
##           global.est bootstrap.mean     bias bootstrap.se
## Intercept   -0.37256        -0.3802 -0.00763      0.08774
## Slope        0.96210         0.9616 -0.00050      0.00851
## 
## Bootstrap results generated with environment RNG settings.
## [1] "pearson correlation coefficient"